University of Texas at San Antonio



**Open Cloud Institute**


Machine Learning/BigData EE-6973-001-Fall-2016


**Paul Rad, Ph.D.**

**Gonzalo De La Torre, Ph.D. Student in EE**
**Vivek Sarkale, M.S. Student in CS**



A Machine Learning Approach Towards Anomalous Traffic Detection in Web Applications


Gonzalo De La Torre, Vivek Sarkale
*Open Cloud Institute, University of Texas at San Antonio, San Antonio, Texas, USA*
gonzalo.delatorreparra@utsa.edu, prn180@utsa.edu


Project Definition

This project proposes the development of an anomaly-based intrusion detection system built on a machine learning model trained on normal traffic sent from a client web browser to a web server. Traffic is monitored at the application layer (HTTP), and a new HTTP request is flagged as anomalous when it deviates from the model by more than a defined threshold.

Intrusion Detection Systems (IDS) analyze network traffic to detect malicious actions or behaviors that can compromise sensitive data or the security of a computer system. Typically, they are classified as signature-based (negative approach) or anomaly-based (positive approach). Signature detection systems compare signatures of incoming traffic with signatures of known attacks stored in a database. Anomaly detection systems, on the other hand, build a model of what is considered normal network traffic. The model is then used to monitor incoming traffic, and any traffic deviating from the “normal” behavior is classified as anomalous.

Because the goal is to detect new, previously unseen attacks, the intrusion detection system to be developed will be anomaly-based.
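The idea of learning "normal" traffic and flagging requests that deviate beyond a threshold can be illustrated with a minimal sketch. It is not the RNN model proposed in this project: it assumes two hand-picked features (request length and the fraction of non-alphanumeric characters), a z-score as the deviation measure, and a threshold of 3.0. The sample requests and all function names are hypothetical.

```python
import statistics

def features(request):
    """Map a request string to two simple numeric features (illustrative choice)."""
    special = sum(1 for ch in request if not ch.isalnum())
    return [float(len(request)), special / max(len(request), 1)]

def fit(normal_requests):
    """Learn per-feature mean and standard deviation from normal traffic only."""
    columns = list(zip(*(features(r) for r in normal_requests)))
    means = [statistics.mean(c) for c in columns]
    # Guard against a zero deviation so scoring never divides by zero.
    stdevs = [statistics.pstdev(c) or 1e-9 for c in columns]
    return means, stdevs

def score(model, request):
    """Largest absolute z-score across features: the distance from 'normal'."""
    means, stdevs = model
    return max(abs(x - m) / s
               for x, m, s in zip(features(request), means, stdevs))

def is_anomalous(model, request, threshold=3.0):
    """Flag a request whose deviation exceeds the (assumed) threshold."""
    return score(model, request) > threshold

# Hypothetical normal traffic used to build the model.
normal = [
    "GET /shop/index.jsp",
    "GET /shop/cart.jsp?id=12",
    "GET /shop/cart.jsp?id=7",
    "POST /shop/register.jsp",
]
model = fit(normal)

attack = "GET /shop/cart.jsp?id=1' OR '1'='1'; DROP TABLE users--"
print(is_anomalous(model, attack))
print(is_anomalous(model, "GET /shop/cart.jsp?id=3"))
```

Because the model is fitted only on normal requests, no attack signatures are needed; the cost is that the threshold has to be tuned to balance missed attacks against false alarms.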

Outcome

An RNN model will be developed by training on a series of network traffic sessions originating from a client web browser, each session containing multiple requests of normal traffic. The RNN model will then be tested against both normal and abnormal traffic to determine whether it correctly identifies the monitored traffic as normal or abnormal.
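An RNN consumes sequences of numbers, so each session of requests has to be encoded before training. The proposal does not specify an input representation; the sketch below assumes a character-level encoding with fixed-length padding, which is one common option. The function names, the reserved ids (0 for padding, 1 for unknown characters), and the sample session are all hypothetical.

```python
PAD, UNK = 0, 1  # reserved ids (assumed convention)

def build_vocab(requests):
    """Assign an integer id to every character seen in normal traffic."""
    chars = sorted({ch for r in requests for ch in r})
    return {ch: i + 2 for i, ch in enumerate(chars)}

def encode(request, vocab, max_len):
    """Turn one request into a fixed-length list of character ids."""
    ids = [vocab.get(ch, UNK) for ch in request[:max_len]]
    return ids + [PAD] * (max_len - len(ids))

def encode_session(session, vocab, max_len):
    """Encode every request in a session, preserving request order."""
    return [encode(r, vocab, max_len) for r in session]

# Hypothetical session made of two normal requests.
session = ["GET /a", "GET /b"]
vocab = build_vocab(session)
encoded = encode_session(session, vocab, max_len=8)
print(encoded)

# A character never seen during training maps to the UNK id.
print(encode("GET /z", vocab, max_len=8))
```

Building the vocabulary from normal traffic only means that characters appearing solely in attack payloads collapse to the unknown id, which is itself a signal the model can learn to treat as unusual.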

Dataset

This project uses the HTTP CSIC 2010 dataset developed at the "Information Security Institute" of CSIC (Spanish Research National Council). The dataset contains thousands of labeled HTTP requests targeted at an e-commerce website. In these requests, users add items to a shopping cart, register, and provide personal information. In total, the dataset contains more than 36,000 normal requests and 25,000 anomalous requests.
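Before any model can be trained, each request has to be split into its parts. The following is a minimal sketch, assuming the requests are available as raw HTTP/1.1 text (request line, headers, blank line, optional body). The `parse_request` helper and the sample request are hypothetical and are not taken from the dataset.

```python
from urllib.parse import urlsplit, parse_qsl

def parse_request(raw):
    """Split one raw HTTP request into method, path, query, headers and body."""
    raw = raw.replace("\r\n", "\n")          # normalize line endings
    head, _, body = raw.partition("\n\n")    # headers end at the first blank line
    lines = head.splitlines()
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    parts = urlsplit(target)
    return {
        "method": method,
        "path": parts.path,
        "query": dict(parse_qsl(parts.query)),
        "headers": headers,
        "body": body.strip(),
    }

# Hypothetical request in the style of an e-commerce application.
sample = (
    "POST http://localhost:8080/shop/cart.jsp?id=2 HTTP/1.1\r\n"
    "Host: localhost:8080\r\n"
    "Content-Length: 9\r\n"
    "\r\n"
    "item=book"
)
parsed = parse_request(sample)
print(parsed["method"], parsed["path"], parsed["query"], parsed["body"])
```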

The anomalous requests carry static attacks, dynamic attacks, and unintentional illegal requests, and were generated using Paros and W3AF. The dataset includes the following types of attacks:

  • Obsolete file existence
  • Default file or example file existence
  • HTTP method validity
  • CRLF injection
  • Failure to restrict URL access
  • Invalid parameters
  • Command injection
  • Cross site scripting
  • SQL injection
  • Buffer overflows
  • Broken authentication and session management
  • Broken access control
  • Remote administration flaws
  • Web application and server misconfiguration
  • Malicious file execution
  • Insecure direct object reference
  • Information leakage and improper error handling

Dataset source: http://www.isi.csic.es/dataset/